SUMMARY


Efforts to develop evaluation methods for fuzzy inference systems which are not based on crisp, quantitative data 
or processes (i.e., where the phenomenon the system is built to describe or control is inherently fuzzy) are just 
beginning. This paper suggests that the method of fuzzy least squares can be used to perform such evaluations. 
Regressing the desired outputs onto the inferred outputs can provide both global and local measures of success. 
The global measures have some value in an absolute sense but are particularly useful when competing solutions 
(e.g., different numbers of rules, different fuzzy input partitions) are being compared. The local measure 
described here can be used to identify specific areas of poor fit where special measures (e.g. the use of emphatic 
or suppressive rules) can be applied. Several examples are discussed which illustrate the applicability of the 
method as an evaluation tool. 


INTRODUCTION 


Smith and Comer [1] point out that evaluation of the behavior of a fuzzy system can be quite difficult. They also 
mention (p. 20) that the qualitative knowledge of the controller designer is more suited to accurate specification of 
the antecedent portions of the control rules than to accurate specification of the consequent portions. This is
because (presumably and at least in part) the role of the input variables in system dynamics is more easily 
understood in general, and also because the input variables are often more directly and more easily expressible in 
fuzzy (linguistic) terms (e.g., temperature as high, medium, and low). This is perhaps even more true in "softer" 
areas like psychology and sociology, where "harder" inputs like age and socioeconomic status are used to control 
(predict) softer outputs like behavior or risk (for interesting comments along these lines in the context of fuzzy 
classification see [2]). In fact, the very foundations of some methods of analysis and prediction used in these soft 
areas, especially classical least squares, are predicated upon input variables whose values are assumed to be
measurable without error (see, e.g., [3], Section 1.1).

Methods for the evaluation and tuning of fuzzy systems do not really challenge this assumption; they typically 
assume that the designer has the input distributions about right and then adjust formal "parameters" of the 
inference mechanism to improve controller performance. Again, this works well in hard areas but should prove 
difficult to apply in emerging softer applications where there is no aspect of the inference process that can be
trusted completely. It becomes important, therefore, in soft applications, to have some way of evaluating the 
accuracy and effectiveness of a fuzzy inference system which assumes as little as possible about the validity of the 
rules, and even of their essential characteristics, beyond the linguistic properties they express. Furthermore, there 
may often be no real way of knowing whether interpolated consequent fuzzy values (values not supplied directly 
by an expert) are accurate to the point where they can serve to confirm the chosen system and parameters. It 
should prove useful, therefore, to have available methods which can provide overall evaluation measures given 
certain assumptions about the structure and regularity of the output (consequent) fuzzy distributions. 

Perhaps the most well-characterized and formalized methods for the evaluation and tuning of fuzzy controllers are 
those based on the concept of cell mapping [1, 4-5]. Nonetheless, the application of cell mapping to evaluation 
and tuning depends crucially on the existence of sufficient crisp input-output pairs to generate the cell maps 
(actually, this is a bit of an oversimplification - see [5], pp. 749-750), and also provides no real way to 
distinguish between competing fuzzifications of the input state space (unless of course the fuzzification is so bad 
that tuning is impossible). This paper suggests that an evaluation based on fuzzy least squares can indeed 
distinguish between competing input state space fuzzifications and can be used (quite easily) in cases where 
neither the input nor the output is readily defuzzifiable. 

FUZZY LEAST SQUARES


The method of fuzzy least squares was introduced by Diamond [6] as an approach to the fuzzy regression 
problem, i.e. as a method for parameterizing the relationship between two sets of fuzzy numbers; its advantage 
over other techniques (besides computational simplicity), as Diamond points out, is the amenability of the 
parameterization to evaluation by standard measures, e.g., examination of residuals. For the purpose at hand, it is 
particularly important that the spatial geometry of the fuzzy least squares method be understood; to accomplish 
this goal, we turn briefly to crisp models. 

Basically, the solutions to linear parameter estimation problems as well as their computational simplicity depend 
heavily on assumptions regarding which measurements may be considered to be error-free and which 
measurements may not. If either the independent or the dependent variable measurements are taken to be
error-free, then ordinary least squares may reasonably be applied to the data. In such cases, the error (residual) vectors
are orthogonal to the axis (or axes) along which the error-free values are measured. If, on the other hand, both 
dependent and independent variable measurements must be assumed to be made with error, the parameter 
estimation problem becomes considerably more difficult (even analytically intractable in the general case). In any 
case, if a solution can be generated, the error vectors will be orthogonal to the fitted line itself (the first chapter of 
[3] contains an excellent summary and relevant examples). 

In extreme cases, especially those in which the data points are contaminated by outliers, the differences in the 
various solutions may be striking, as is illustrated in the figure below (from [7]). If the x coordinates are assumed
to be error-free and a line is fitted by the method suggested in [7] (not ordinary least squares but equivalent for
the present purpose), then errors orthogonal to the x axis are minimized by a fitted line which passes through the
outlier (the point at 0,0). This is clearly a most undesirable solution. If, on the other hand, both the x and y
coordinates are assumed to contain errors (even isotropic ones), the method yields a much more reasonable fitted
line (the one parallel to the y axis).
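
The geometric distinction drawn above can be illustrated with a minimal sketch in Python (NumPy assumed). The
data are hypothetical and are not those of the figure in [7], and the function names are illustrative only.
Ordinary least squares leaves residuals orthogonal to the x axis, whereas a total least squares fit, computed here
from the singular value decomposition of the centered data, leaves residuals orthogonal to the fitted line itself.

```python
import numpy as np

def ols_line(x, y):
    """Ordinary least squares: x is taken as error-free, residuals are vertical."""
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    return a, b

def tls_direction(x, y):
    """Total least squares: unit direction of the fitted line through the centroid.

    The direction is the leading right singular vector of the centered data,
    so the residuals are perpendicular to the fitted line.
    """
    centered = np.column_stack([x - x.mean(), y - y.mean()])
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

# Hypothetical data: a nearly vertical band of points plus one outlier at (0, 0)
x = np.array([5.0, 5.1, 4.9, 5.0, 5.1, 4.9, 0.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 0.0])

a, b = ols_line(x, y)
direction = tls_direction(x, y)
print("OLS slope:", b)
print("TLS slope:", direction[1] / direction[0])
```

With an outlier of this kind the two slopes can differ noticeably, which is the effect the figure is meant to show.
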

To return to fuzzy considerations, the point is that the method of fuzzy least squares, despite its "ordinary least
squares" character, is more closely related (in spirit, as it were) to fitting (regression) approaches in which both
dependent and independent variables are measured with error. It should be emphasized, however, that this is not true from a
purely analytic point of view. Once a distance metric is decided upon, and once the hypergeometric characteristics 
of the set of triangular fuzzy numbers are established, the fuzzy least squares parameter vector is derived by an 
orthogonal projection of the dependent variable vector onto the "cone" of potential solutions exactly as in ordinary 
least squares (Diamond's paper [6], pp. 142-146 contains an elegant exposition of these facts, and section 2.3 of 
[3] contains highly instructive comments and diagrams in a crisp context). Thus, from an analytical point of view, 
though both the independent and dependent variable vectors are fuzzy, one is assumed to be measured without 
error while the other is not. 

From another point of view, however, fuzzy least squares is more like a "total least squares" approach [3] in
which both the dependent and independent vectors (or matrices) are assumed to be measured with errors. This is
because the fuzzy least squares method, with its two separate spatial components (mode and spread), permits the
search for a solution vector to move about a more complex (and hence more flexible) space (only in effect, of
course, since the solution is derived analytically). The result is that fuzzy least squares can preserve an extremely
good fit in fuzziness even if, for some reason, one or more values in the data are outliers relative to mode. Since
the fuzziness of the dependent and independent variables, taken together, is a measure of the overall uncertainty
of the system, this characteristic has the effect of preserving the degree of overall uncertainty in a manner similar
to total least squares methods.

FUZZY LEAST SQUARES AND FUZZY INFERENCE

It would surely be instructive to pursue the analogy between fuzzy least squares and total least squares further and
more formally, but that would take us far beyond the scope of this paper. It is worth mentioning, though, by way
of leaving the previous topic and beginning the current one, that Diamond's fuzzy least squares minimization
condition (1) could conceivably be replaced to good effect by (2), where minimization of the sum of the squared
distances between the measured values ($Y_i$) and the calculated values ($E + bX_i$) is replaced by minimization of
some scalar norm of the "total error" matrix ([.]), and where $X_0$ is the unobservable "true" vector of fuzzy
predictors (see [3], p. 186 and p. 23):

\[ \sum_{i} d(E + bX_i, Y_i)^2 \qquad (1) \]

\[ F(E, b, X_0) = [\, d(X, X_0) \,;\, d(E + bX, Y) \,] \qquad (2) \]

If the Frobenius norm were used, a solution to (2) would be equivalent to a solution of the "fuzzy total least
squares" minimization function (cf. [3], p. 186).

Be all of this as it may, it seems fair to conclude that fuzzy least squares is a relatively "robust" form of regression 
which is eminently suitable for parameterizing the relationship between two n-dimensional fuzzy vectors with 
elements of regular shape (at least triangular and trapezoidal [6]). The vectors being compared do not necessarily 
have to be particularly "linear", though they must at least be "coherent" ([6], pp. 150-151); vectors produced as a
result of fuzzy inference are as likely to be coherent as not, one would imagine, but the condition is easily tested
for [6], so inference systems which do not produce coherent output should simply not be subjected to the 
evaluation procedures suggested here. 

Fuzzy least squares, then, forms the basis for a simple evaluation technique for fuzzy inference systems. Given 
two possible solutions, regress the known (fuzzy) output (the "correct" values) on the output fuzzy sets generated 
by the two inference processes. Compare the two solutions via any of many available evaluation methods, and 
keep the one which evaluates higher. Certain evaluation methods may even suggest ways in which the better 
solution can be improved. Space does not permit further general discussion, so we conclude by introducing a few 
evaluation measures and by providing examples of their use. It is worth noting at this point that the calculations 
needed to perform fuzzy least squares and to compute the evaluation measures are straightforward and can be 
performed with minimal computational overhead. It is also worth noting that it may be possible to extend the
domain of this method to inference systems which do not produce fuzzy "numerical" output by "fitting" fuzzy
numbers over the fuzzy sets by linear interpolation as is done in fuzzy modeling (see, e.g., [8]), but this matter
will not be pursued here.
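
The comparison procedure outlined above (regress the reference output on each candidate's inferred output and
keep the better fit) can be sketched as follows. This is a minimal illustration rather than Diamond's derivation [6].
It assumes triangular fuzzy numbers stored as (mode, left spread, right spread), a squared distance equal to the sum
of the squared differences of the modes, the left endpoints, and the right endpoints, and a non-negative scalar
slope b. Under those assumptions the minimization reduces to an ordinary least squares problem with one shared
slope and three intercepts. All names and data are hypothetical, and the requirement that the fitted constant E
have non-negative spreads is not enforced.

```python
import numpy as np

def endpoints(t):
    """Convert (mode, left spread, right spread) rows to
    (mode, left endpoint, right endpoint) rows."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([t[:, 0], t[:, 0] - t[:, 1], t[:, 0] + t[:, 2]])

def fuzzy_least_squares(X, Y):
    """Fit Y ~ E + bX, with E triangular and b a scalar assumed non-negative.

    Returns E as (mode, left spread, right spread), the slope b, and the
    residual sum of squared distances.
    """
    px, py = endpoints(X), endpoints(Y)
    cx = px - px.mean(axis=0)
    cy = py - py.mean(axis=0)
    b = np.sum(cx * cy) / np.sum(cx ** 2)        # one slope shared by all components
    e = py.mean(axis=0) - b * px.mean(axis=0)    # one intercept per component
    sse = float(np.sum((e + b * px - py) ** 2))
    E = (float(e[0]), float(e[0] - e[1]), float(e[2] - e[0]))
    return E, float(b), sse

# Hypothetical reference output and two hypothetical inferred outputs
reference = [(200, 20, 20), (400, 40, 40), (800, 120, 120), (1400, 280, 280)]
candidate_a = [(210, 21, 21), (390, 39, 39), (820, 123, 123), (1380, 276, 276)]
candidate_b = [(300, 30, 30), (300, 30, 30), (900, 135, 135), (1200, 240, 240)]

for name, inferred in (("A", candidate_a), ("B", candidate_b)):
    E, b, sse = fuzzy_least_squares(inferred, reference)
    print(name, E, round(b, 3), round(sse, 1))
```

The candidate with the smaller residual sum would be retained, subject to the further measures introduced below.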

EVALUATION MEASURES 

1. GLOBAL MEASURES. The most obvious global measures of success are the least squares residuals. A related
value which varies conveniently between 0 (no correlation) and 1 (perfect correlation) is the correlation
coefficient. For generality, we define (see [9], p. 280) the fuzzy multiple correlation coefficient (MCC) as

\[ \mathrm{MCC} = \left\{ \frac{\sum_{i=1}^{n} d(E + bX_i, \bar{Y})^2}{\sum_{i=1}^{n} d(Y_i, \bar{Y})^2} \right\}^{1/2} \qquad (3) \]

where d again is the distance between two fuzzy numbers [6] and where $\bar{Y}$ is the mean dependent fuzzy value,
even though all examples in this paper are univariate and extensions to the multivariate case are non-trivial ([6], p.
156).
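
A minimal sketch of the MCC computation follows. It assumes that the MCC is the square root of the ratio of the
summed squared distances of the fitted values from the mean to the summed squared distances of the observed
values from the mean, that fuzzy numbers are triangular and stored as (mode, left spread, right spread), that the
squared distance is the sum of the squared differences of the modes and of the left and right endpoints, and that
the slope b is non-negative. The function names and the data are hypothetical.

```python
import numpy as np

def endpoints(t):
    """Convert (mode, left spread, right spread) rows to
    (mode, left endpoint, right endpoint) rows."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([t[:, 0], t[:, 0] - t[:, 1], t[:, 0] + t[:, 2]])

def fuzzy_mcc(X, Y, E, b):
    """Fuzzy multiple correlation coefficient for the line E + bX (b >= 0 assumed)."""
    px, py = endpoints(X), endpoints(Y)
    fitted = endpoints([E])[0] + b * px          # endpoints of E + bX_i
    y_bar = py.mean(axis=0)                      # fuzzy mean of the Y_i
    explained = np.sum((fitted - y_bar) ** 2)    # sum of d(E + bX_i, Ybar)^2
    total = np.sum((py - y_bar) ** 2)            # sum of d(Y_i, Ybar)^2
    return float(np.sqrt(explained / total))

# Hypothetical data lying exactly on Y = (5, 1, 1) + 2X
X = [(100, 10, 10), (200, 20, 20), (300, 30, 30), (400, 40, 40)]
Y = [(5 + 2 * m, 1 + 2 * l, 1 + 2 * r) for m, l, r in X]
print(fuzzy_mcc(X, Y, E=(5.0, 1.0, 1.0), b=2.0))
```

When E and b are the least squares estimates, the value lies between 0 and 1, as in the crisp case.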

Another useful global measure of success is the relative entropy of the fuzzy least squares solution as defined in 
[10]. This form of relative entropy is a measure of the success of the regression "line" in tracking the fuzziness of 
the elements of the dependent variable vector. It is defined as (see [10] for a detailed description and rationale):

[Equation (4), written in terms of spread($y_i$) and spread($\hat{y}_i$), is not recoverable from the source.]

where $\hat{y}_i$ is the estimated $y_i$ (i.e., $E + bX_i$).


2. LOCAL MEASURES. The only local evaluation method discussed here will be the weighted squared
standardized distance (WSSD) [11-12]. In the univariate case, the WSSD can be written as:

\[ \mathrm{WSSD}_i = \frac{(n - 1)\, b^2\, d(X_i, \bar{X})^2}{\sum_{k=1}^{n} d(Y_k, \bar{Y})^2} \qquad (5) \]

where $\bar{X}$ and $\bar{Y}$ are the X and Y means, where b is the regression coefficient, and where d again is the
fuzzy distance. In ordinary least squares regression, the magnitude of $\mathrm{WSSD}_i$ is used to determine whether
or not point i is a "high-leverage point", i.e., a point in a sparse region of the X-space (see, e.g., [12], pp. 94 ff.).
We are interested here in the WSSD because a fuzzy
inference tends to produce similar or identical output when the inference mechanism operates near the centers of 
the involved fuzzy sets and to produce rapidly changing output as the inference mechanism operates near areas of 
overlap (and thus near areas of heavy interpolation). A good inference mechanism should produce transitional 
areas in its output which correspond to areas of overlap in the output data partition. Thus, the output vector 
produced by a fuzzy inference should have clusters of similar or identical values which match the reference values 
near the centers of the elements of the output reference partition, and rapid changes in value which match the 
reference values in and near the overlap areas of the output reference partition. This phenomenon will produce 
clusters of points with similar or identical leverage in the regression followed by points with unique leverage 
values (at the transitional areas). In a good model, then, the clusters and transitions in WSSD values will line up 
nicely with the centers and overlap areas of the output reference partition respectively. 
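
The clustering behavior described above can be illustrated with a minimal sketch. It assumes the univariate form
WSSD_i = (n - 1) b^2 d(X_i, Xbar)^2 / sum_k d(Y_k, Ybar)^2, triangular fuzzy numbers stored as (mode, left spread,
right spread), and a squared distance equal to the sum of the squared differences of the modes and of the left and
right endpoints. The data, the slope, and the function names are hypothetical.

```python
import numpy as np

def endpoints(t):
    """Convert (mode, left spread, right spread) rows to
    (mode, left endpoint, right endpoint) rows."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([t[:, 0], t[:, 0] - t[:, 1], t[:, 0] + t[:, 2]])

def wssd(X, Y, b):
    """Weighted squared standardized distance of each X_i (univariate case)."""
    px, py = endpoints(X), endpoints(Y)
    n = len(px)
    dx2 = np.sum((px - px.mean(axis=0)) ** 2, axis=1)   # d(X_i, Xbar)^2 for each i
    total = np.sum((py - py.mean(axis=0)) ** 2)         # sum of d(Y_k, Ybar)^2
    return (n - 1) * b ** 2 * dx2 / total

# Hypothetical inferred output: two plateaus separated by one transitional value
inferred = [(200, 20, 20), (200, 20, 20), (500, 60, 60), (800, 120, 120), (800, 120, 120)]
reference = [(210, 21, 21), (220, 22, 22), (480, 60, 60), (790, 118, 118), (810, 121, 121)]

values = wssd(inferred, reference, b=1.0)   # b = 1.0 is an assumed slope
for x_i, w in zip(inferred, values):
    print(x_i, w)
```

Identical inferred values share a WSSD value, and the transitional value receives one of its own; it is this
pattern that is compared with the centers and overlap areas of the output reference partition.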

A NOTE ON "PIECEWISE" APPROXIMATIONS 

It is important to note that this paper is not suggesting that fuzzy least squares is to be used to construct an 
accurate "piecewise" approximation to some unknown "functional relationship" between input and output fuzzy 
sets. To understand better what is being suggested, consider a fuzzy Lagrangian interpolation polynomial which 
relates the true output fuzzy sets and the ones generated by the inference (as in [13] with n + 1 fuzzy points). As 
with crisp Lagrangian interpolation, such a polynomial could be used, for instance, to compute error bounds 
(using contour integrals in the complex plane [14]) if we knew the "true" functional relationship between the 
actual output fuzzy sets and the generated ones; such a relationship may not exist, of course, in the general case 
and in the usual sense of the word functional, but would in any event depend on the accuracy of the inference
process as an interpolator and "smoother". The least squares regression line, then, serves in this context as a crude 
continuous approximation to some presumably nonlinear and possibly unrecoverable "difference" function. 

EXAMPLES 

All of the examples discussed here are based on examples from Cao and Kandel [15]. Since their examples are 
based on crisp input and output (rotational speed of a d.c. series motor as a function of input current), the output 
was "refuzzified" as described below so that fuzzy regression could be applied. The data sets in our examples, 
therefore, are not "inherently" fuzzy, but they do have the advantage of being associated with thoroughly 
analyzed models which are easy to evaluate for accuracy (in a crisp sense). Note also that as was mentioned 
earlier, the notion of "reshaping" the output of a fuzzy inference process so that it can be analyzed by fuzzy least 
squares is not an unreasonable one, though of course for useful application it would require more elaborate 
methods than the one used here. 

1. The "model" curve of Example 7 in [15] is a connected piecewise linear curve of five segments with overall 
rising trend. The model curve is "covered" by the five overlapping fuzzy sets shown in Figure 1. Assuming that 
this consequent set representation is reasonable, the fuzziest areas of coverage (i.e., the areas of maximal overlap) 
are those around the output values 800, 1400, and 1800 (800 because the rules do not reference the second set 
(SMALL)). Ideally, the inference system should map the corresponding input values (1.0, 3.0, and 7.0) into these 
same transition areas. Cao and Kandel cover the input range by six overlapping fuzzy sets; we use the WSSD, 
MCC, and relative entropy to compare their six input set partition with a four input set partition and an eight 
input set partition. The rules in the four and eight input set cases are adjusted to conform insofar as is possible 
with the content of Cao and Kandel's original (six input set) rules. The crisp output data and the crisp inferred 
output data were fuzzified by using 10%, 15%, or 20% of the mode as the left and right spreads, increasing the 
percentage as the numbers got larger; in this manner, a reasonably coherent output data set and inferred output 
data set were constructed. The rules themselves are as follows (in each case the input domain is distributed equally 
among the component sets): 


FOUR INPUT SETS              SIX INPUT SETS               EIGHT INPUT SETS

NULL -> ZERO                 NULL -> ZERO                 NULL -> ZERO
ZERO-SMALL -> MEDIUM         ZERO -> MEDIUM               ZERO -> MEDIUM
SMALL-MEDIUM -> LARGE        SMALL -> LARGE               ZERO-SMALL -> MEDIUM
LARGE -> VERY LARGE          MEDIUM -> LARGE              SMALL -> LARGE
                             LARGE -> VERY LARGE          SMALL-MEDIUM -> LARGE
                             VERY LARGE -> VERY LARGE     MEDIUM-LARGE -> VERY LARGE
                                                          LARGE -> VERY LARGE
                                                          VERY-LARGE -> VERY LARGE


As Table 1 shows, good results were obtained when the fuzzified inferred values were regressed on the fuzzified 
output data (the table shows only the crisp values, i.e., the modes of the fuzzified values). The transition points 
match nicely, the MCC is high, and the relative entropy is low (of course, the MCC and entropy values are most 
meaningful when compared with other prospective solutions). When only four antecedent sets are used, however, 
the results suffer dramatically. The transition points miss the mark by a considerable margin, the MCC is lower, 
and the relative entropy is higher. With eight antecedent sets results are better but still not as good as with six (it 
is important to note here that overlap was retained at 50%). If one had started with the four or eight set inference 
machine, the lack of matchups in the transition areas would have been a clue that the results could be improved 
upon. It is worth noting that the relative magnitudes of the fuzzy constants are a decent guide to the relative merits 
of the various models. Figures 1, 2, and 3 show the distributions of the crisp output values relative to the output 
set and to the covering fuzzy sets (the fuzzy partition) for the consequent portions of the inference rules; note that 
only the six antecedent set solution produces distinct transition values in the vicinity of the transition regions of 
the output fuzzy partition, and that this fact is reflected in the WSSD values. For details of the membership 
functions, input and output data, and the rules themselves refer to [15]. 
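
The "refuzzification" step used in this example (left and right spreads of 10%, 15%, or 20% of the mode, with the
percentage increasing as the values grow) can be sketched as follows. The cut points at which the percentage
changes are not given in the text; the values 800 and 1400 used here are purely hypothetical, as are the function
name and the sample data.

```python
def fuzzify(values, low_cut=800.0, high_cut=1400.0):
    """Turn crisp values into symmetric triangular numbers
    (mode, left spread, right spread).

    The spread is 10%, 15%, or 20% of the mode; the cut points are hypothetical.
    """
    fuzzy = []
    for v in values:
        if v < low_cut:
            frac = 0.10
        elif v < high_cut:
            frac = 0.15
        else:
            frac = 0.20
        fuzzy.append((float(v), frac * v, frac * v))
    return fuzzy

crisp_output = [200.0, 800.0, 1100.0, 1400.0, 1800.0]   # hypothetical rotational speeds
fuzzy_output = fuzzify(crisp_output)
for item in fuzzy_output:
    print(item)
```

The same function would be applied to the crisp inferred output so that both vectors can be passed to the
regression.
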

2. The model curve of Example 3 in [15] is a piecewise linear curve with two trend shifts. For this example, we
flattened the bottom and shifted the second peak left to conform with the output of an eight antecedent fuzzy set
approximation. As can be seen from the results below (and as would be expected), the eight-antecedent model
yields better statistics. Nevertheless, the six-antecedent model conforms better to the transition points (not shown,
but fairly obvious from an inspection of Figures 4 and 5). This suggests that the flattened area might be better
approximated by emphasizing the appropriate rule in the rule set [15] and retaining the six antecedent fuzzy sets
(note that to do this it is necessary to switch from max-min to product-sum inference - see [16]). As can be seen 
from the third column of values in Table 2, this hypothesis proves correct - there is little difference between the 
eight-antecedent results and the six-antecedent results with emphasis, and the six-antecedent version is truer 
through the transitions. If the second peak is shifted back to its original spot, in fact, the six-antecedent version 
with emphasis is better on all statistics. Note again that the magnitude of the fuzzy constant is a good indication of 
the relative merits of the various models. The rules are as follows: 


6 ANT. SETS (MAX-MIN)        8 ANT. SETS (MAX-MIN)          6 ANT. SETS (PROD-SUM)

NULL -> VERY LARGE           NULL -> VERY LARGE             NULL -> VERY LARGE
ZERO -> MEDIUM               ZERO -> MEDIUM                 ZERO -> MEDIUM
SMALL -> ZERO                ZERO-SMALL -> ZERO             SMALL -> ZERO
MEDIUM -> MEDIUM             SMALL -> ZERO                  (repeat above for emphasis)
LARGE -> VERY LARGE          SMALL-MEDIUM -> MEDIUM         (repeat above for emphasis)
VERY LARGE -> MEDIUM         MEDIUM-LARGE -> VERY LARGE     MEDIUM -> MEDIUM
                             LARGE -> LARGE                 LARGE -> VERY LARGE
                             VERY-LARGE -> MEDIUM           VERY-LARGE -> MEDIUM


TABLE 2: RESULTS FOR EX. 3 OF CAO AND KANDEL WITH BOTTOM FLATTENED AND ONE PEAK SHIFTED

                    6 ANT. SETS    8 ANT. SETS    6 ANT. SETS
INFERENCE TYPE      MAX-MIN        MAX-MIN        PROD-SUM
MCC                 0.896          0.972          0.969
REL. ENTROPY        7.68           4.97           5.76

FOR 6 ANTECEDENT SETS MM   Y = 1.06X - (126.10, 16.29, 16.29)
FOR 8 ANTECEDENT SETS MM   Y = (55.78, 7.85, 7.85) + 0.97X
FOR 6 ANTECEDENT SETS PS   Y = (79.82, 14.16, 14.16) + 0.95X

FIG. 4: SIX ANTECEDENT FUZZY SETS

FIG. 5: EIGHT ANTECEDENT FUZZY SETS (AXIS: I VALUES)



3. In this example we return to the data of Example 7 in [15], but we add a bubble to the line at input values 2 to 
3. As we emphasize the rule which raises the output values in that area (SMALL -> LARGE), first once and then 
twice, we observe corresponding improvement in the results. This improvement is obvious in the figures below, 
and is also tracked nicely once again by the statistics. Note that only the "double emphasis" inference creates a 
transition point in WSSD values in the center of the bubble. Since the effect of emphasis is essentially to shift a 
transition point toward the emphasized region, this is a sign that the input and output data sets are a good match. 
As an illustration of the value of the WSSD, we modified the single emphasis inference results so that just the 
spreads matched better in the bubble. Note that, as one might expect, this improves the overall least squares 
solution, but note also that this creates a WSSD transition point in the proper place. Since this would not be 
apparent from an inspection of the modes alone, the value of the WSSD to a detailed evaluation of the inference 
results is clear. 

TABLE 3: RESULTS FOR EXAMPLE 7 OF CAO AND KANDEL WITH BUBBLE ADDED

STATISTIC    DOUBLE EM.    SINGLE EM.    NO EMPHASIS    SINGLE EM. +
MCC          0.967         0.9565        0.896          0.9566
ENT          3.90          4.45          6.48           4.27
WSSD 3.5     0.022243      0.026888      0.071660       0.027036
WSSD 3.0     0.022243      0.026888      0.133615       0.027036
WSSD 2.5     0.089423*     0.398910      1.005964       0.378739*
WSSD 2.0     0.438807      0.398910      1.005964       0.400629
WSSD 1.5     0.438807      0.398910      1.005964       0.400629

FOR DOUBLE EMPHASIS      Y = (163.27, 24.75, 24.75) + 0.91X
FOR SINGLE EMPHASIS      Y = (199.36, 30.17, 30.17) + 0.89X
FOR NO EMPHASIS          Y = (503.50, 77.23, 77.23) + 0.73X
FOR SINGLE EMPHASIS +    Y = (198.51, 28.33, 28.33) + 0.89X

+ DIFFERS FROM SINGLE EMPHASIS ONLY IN FUZZINESS OF VALUES IN BUBBLE (BETTER MATCH)


FIG. 6: EX7 WITH BUBBLE FROM 2 TO 3 (NO EMPHASIS)
SOLID LINE - CAO AND KANDEL; DOTTED LINE - THE BUBBLE; DASHED LINE - PROD-SUM APPROX.; AXIS: I VALUES

FIG. 7: EX7 WITH BUBBLE FROM 2 TO 3 (SINGLE EMPHASIS)
SOLID LINE - CAO AND KANDEL; DOTTED LINE - THE BUBBLE; DASHED LINE - PROD-SUM APPROX.; AXIS: I VALUES

FIG. 8: EX7 WITH BUBBLE FROM 2 TO 3